by KC Harris, submitted May 2022 for Legal Studies 123 @ UC Berkeley
In 2018, the Center for Policing Equity released a report on the Berkeley Police Department finding racial disparities in arrest rates between white and BIPOC subjects. Their study found that people of color were 4.5x-6.5x more likely to be stopped than white citizens, 4.5x-20x more likely to be searched per capita, and 2x as likely to be arrested overall. The report sparked major discussion of policing in the city of Berkeley, and one of the organization's requests was that the city collect more race-specific data in its reports moving forward, particularly the perceived race of the subject and whether that race had been perceived prior to the stop.
Since that report, the city has begun collecting RIPA-compliant stops data (September 2020 - present) and very recently (as of May 2022) approved a batch of departmental and municipal changes that will radically alter how traffic stops are handled in the city of Berkeley. There is a unique opportunity here to conduct a more pointed analysis of stops and race, both in comparison to the broader non-RIPA stops data of the past, and in preparation for the very different data that will follow the reorganization of Berkeley transportation laws and their enforcement. We will look to see whether those specific racial disparities actually do exist in the better data, and set up opportunities for clear comparisons as future policies change.
The direct questions of this analysis are:
The stops data used in this project was collected by the Berkeley Police Department from 2015-2022 and downloaded in March 2022 from the city's publicly available open-access data portal. The data comes in two formats: RIPA and non-RIPA-compliant data. All data since October of 2020 is RIPA-compliant and as a result provides increased insights. Non-RIPA-compliant data has been kept in an attempt to comprehensively represent the city, but some variables have been changed to match the new RIPA terminology, and certain models may vary in observation size due to a lack of shared features between the two datasets. This is unfortunate, but some key assumptions can still be tested regardless of the differences in data.
Combined, there are approximately 64k observations, with 8 shared features. For the smaller (RIPA-compliant only) dataset there are approximately 8k observations and 45 features.
Long-term crime and arrest data is also not directly available through the city and will be left out of this analysis. While it is still beneficial to look at stops to analyze police activity, it’s important to clarify the difference between stops and arrests, and admit that while comprehensive, the presently available stops data do not paint the whole picture. Inferences made in previous reports surrounding the likelihood of being stopped cannot be directly equated to the likelihoods of being arrested post-stop, even if post-stop data can still reveal racial disparities.
Overall features included:
learecordid, incidentnumber, dateofstop,
timeofstop, durationofstop, city, lat, long,
raceperceivedpriortostop, perceivedraceorethnicity,
perceivedgender, perceivedage, reasonforstop,
reasonforstopnarrative, resultofstop, personnumber,
isstopmadeinresponsetocallforservice, informationbasedstop,
typeofstop, officertypeofassignment, location,
islocationak12publicschool, ifk12schoolisstopofastudent,
schoolname, educationcodesection, educationcodesubdivision,
perceivedgendernonconforming, islgbt,
personhadlimitedornoenglishfluency, perceivedorknowndisability,
cityofresidence, trafficviolationtype,
trafficviolationoffensecodes, suspicionoffensecode,
suspicionsubtype, actionstaken, basisforsearch,
basisforsearchnarrative, basisforpropertyseizure,
typeofpropertyseized, contrabandorevidence, othercontrabanddesc,
warningoffensecodes, citationoffensecodes,
infieldciteandreleaseoffensecodes, custodialarrestoffensecodes
And added census features included:
censustract,
tract_medianincome, tract_totalpop, tract_whitepop,
tract_nonwhitecomp, tract_poccomp, tract_nonwhitepop,
tract_pocpop, tract_aapop, tract_na_aipop, tract_aisianpop,
tract_hawaiian, tract_mixed2, tract_totalnumstops,
tract_annualstops, tract_distancefromcal
For variables like local median income or residential racial composition, data was pulled from the 2020 US Census on the census website and then joined into the dataset by census tract. A census map of the city of Berkeley was available on the city's website in multiple formats, and all maps created in this project were modified versions of the .geoJSON files available there.
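The tract-level join described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names and values here are hypothetical stand-ins for the real stops and census tables.

```python
import pandas as pd

# Hypothetical mini versions of the stops and census tables.
stops = pd.DataFrame({
    "incidentnumber": [101, 102, 103],
    "censustract": ["4211", "4222", "4211"],
})
census = pd.DataFrame({
    "censustract": ["4211", "4222"],
    "tract_medianincome": [85000, 62000],
    "tract_totalpop": [4100, 3600],
})

# Left-join census variables onto each stop by tract, so every stop
# keeps its row even if its tract is missing from the census table.
merged = stops.merge(census, on="censustract", how="left")
print(merged[["incidentnumber", "tract_medianincome"]])
```

A left join is the natural choice here: it preserves every stop observation while attaching the tract-level variables wherever a matching tract exists.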
Before the modelling phase of this project, a few different forms of exploratory analysis were performed to understand what kind of data we were looking at. We'll break down some of that here, as well as link to some of the results.
Although there were many different categories for race provided by the data (especially in the RIPA data, where the feature became 'perceived race'), we narrowed them down to five categories. This was appropriate not only because the focus of our analysis was primarily on White vs Black or Hispanic stop/arrest rates, but also because the five selected categories already made up the majority of observations. It appeared that, most of the time, officers recorded only one race. Observations with multiple perceived ethnicities were placed in the 'mixed' category.
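The collapse into five categories can be sketched like this. The raw label strings below are hypothetical examples of how perceived-race entries might appear, not the exact RIPA codes.

```python
import pandas as pd

# Hypothetical perceived-race strings; multi-race entries are comma-separated.
raw = pd.Series([
    "White", "Black/African American", "Hispanic/Latino(a)",
    "Asian", "White, Asian", "Black/African American",
])

# Map each single-race label to one of the five analysis categories.
FIVE_GROUPS = {
    "White": "White",
    "Black/African American": "Black",
    "Hispanic/Latino(a)": "Hispanic",
    "Asian": "Asian",
}

def collapse(entry: str) -> str:
    # Entries listing more than one perceived race go to 'Mixed'.
    if "," in entry:
        return "Mixed"
    return FIVE_GROUPS.get(entry, "Mixed")

race5 = raw.map(collapse)
print(race5.value_counts())
```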
To immediately examine whether disparities were continuing, we first looked at the percent of arrests in the dataset versus percent of residential population in Berkeley for each race. Similar to the report, we found that Black subjects made up a disproportionately large share of those arrested despite being a substantially smaller portion of the population of Berkeley. They were also arrested almost as often as White subjects, despite white people making up over 60% of the residents of Berkeley. While this didn't prove anything yet, it was certainly an indicator that something was going on.
The next analyses looked at what kinds of stops we were observing. Stop type and result of stops were shared by both datasets, and were either labeled the same or not difficult to modify and join together.
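Joining the shared features across the two datasets amounts to renaming the legacy columns into RIPA terminology and stacking the tables. The column names and values below are hypothetical placeholders for illustration only.

```python
import pandas as pd

# Hypothetical fragments of the two datasets with differing column names.
ripa = pd.DataFrame({
    "typeofstop": ["Vehicle", "Pedestrian"],
    "resultofstop": ["Citation", "No Action"],
})
legacy = pd.DataFrame({
    "stop_type": ["Vehicle"],
    "dispositions": ["Arrest"],
})

# Rename legacy columns to RIPA terminology, then stack the shared features.
legacy = legacy.rename(columns={"stop_type": "typeofstop",
                                "dispositions": "resultofstop"})
combined = pd.concat([ripa, legacy], ignore_index=True)
print(combined)
```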
The majority of stops in this dataset were traffic stops, and most of the stops resulted in citations or other non-arrest scenarios. This also gave us a better idea of the data - while it didn't invalidate our models at all, it was important to contextualize the data and understand that our models would lean towards being more indicative of disparities in traffic stops than in pedestrian stops. This was also a reminder that there were only a few thousand arrests in the dataset, and that we were looking at stop rather than crime data.
While there was initial concern that, with so few arrests, our models might not be accurate, we eventually concluded that this is not only the only data we have, but that it is also still appropriate for a logistic regression, especially given that these arrests are part of a much larger five-year picture with many more observations.
We also generated a variety of maps to view the stops throughout the city and get an immediate idea of where the most stops occurred. Part of the CPE report had mentioned distance from the university as a possible variable in use-of-force incidents, and while we later found little to no relationship, creating the maps still proved insightful. An interactive version of the most recent map is available here, and has boundaries, overlays, and variable information about each census tract.
With our new knowledge that the majority of stops were traffic stops, it was easy to see this reaffirmed by the maps. Upon closer inspection, most of the 'hottest' areas were along major streets like Shattuck or University, and otherwise seemed to cluster immediately around the university. But there was no clear relationship between distance from the university and higher or lower stop rates, with primary examples being areas like the Marina or Northside and the Berkeley Hills.
Finally, to get an idea of another key dependent variable available within the RIPA-compliant data, we created a boxplot for duration of stops. The results of this were mixed: while Black subject stop durations had a clearly larger IQR (interquartile range) and standard deviation, these were not much larger than the Hispanic or White values, and the outlier spread looked largely the same. What was valuable, however, was seeing how Asian stop durations were much lower than all other race groups - this hinted towards a similar pattern we would see later in our analysis.
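The IQR comparison underlying that boxplot can be sketched with a per-group quantile calculation. The durations below are made-up numbers chosen only to illustrate the computation, not the real distributions.

```python
import pandas as pd

# Hypothetical stop durations (minutes) per perceived race group.
df = pd.DataFrame({
    "race": ["Black"] * 4 + ["White"] * 4 + ["Asian"] * 4,
    "duration": [10, 20, 45, 90, 8, 15, 25, 60, 5, 8, 10, 14],
})

# Compute the 25th/75th percentiles per group; their difference is the
# IQR, the spread statistic compared across the boxplot's boxes.
q = df.groupby("race")["duration"].quantile([0.25, 0.75]).unstack()
iqr = q[0.75] - q[0.25]
print(iqr)
```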
For this analysis we used multiple logistic regression to create and compare multi-stage models for White vs Black, Hispanic, and BIPOC stop scenarios and to predict the odds of arrest or longer stops based on these variables. In the larger models (RIPA and non-RIPA) fewer variables were used (this is explained more above). But in the smaller models, where RIPA data provided more insight, we were also able to include the race-perceived-prior-to-stop variable, which ultimately proved to be significant.
Traffic Stop (Categorical Option 1, True)
Tract Distance from Cal (Continuous Variable)
Race perceived prior to stop (Categorical Option 1, True)
Tract Distance from Cal (Continuous Variable)
We essentially compared the same scenario but with different race subjects: a traffic stop (with different races depending on the model) in any tract of the city. This not only revealed how race affected likelihood of arrest under similar conditions, but also revealed whether there were significant effects from tract variables in the first place. Other features (there were approximately 35 other features provided in the RIPA data) were removed from the model either because they were not relevant or because they were too difficult to merge/categorize. Additionally, models were tested against four races:
Report Risk Groups are a grouping of Black and Hispanic subjects made to broadly represent the two groups focused on in the CPE report. We also tested a BIPOC group in our analysis, but this risked skewing the results because of the major differences in duration and likelihood data for the Asian group. The output model summaries for the large models are printed below. If the results are not displaying properly, you can find all of the tables available here.
The larger model is used to broadly observe whether race has a significant relationship with arrest, and if so, how the odds of arrest vary by race. Of course the argument could be made that race was recorded but the arrest occurred for other reasons, but the broad analysis is still important for seeing the overall pattern suggested by the CPE report. The existence of the disparities here also lays the foundation for the more detailed racial comparisons made in the smaller models.
In this case, being White or Hispanic did not have a significant relationship with arrest (p > .05). Being Black or part of the Report Risk Groups had a very significant relationship with arrest (rounded p < .001). In those instances being Black reported a 1.58x odds ratio, and being part of the Report Risk Groups reported a 1.50x. With any OR above 1 suggesting that the variable makes the dependent variable more likely, this indicated that being part of these racial groups makes a subject much more likely to be arrested following a stop. The other racial variable, tract BIPOC composition, proved not to be significant enough for consideration. This is interesting because in the smaller models, tract BIPOC composition was very significant and had a large OR, suggesting that the later-included perception of race was important as well. It's worth noting that all p-values for BIPOC composition fell between .05 and .10, but in this case it was simply too weak of a trend to confirm as relevant. If race-perception data had been included in the larger past dataset, there's an argument to be made that this variable's significance would change.
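To make an odds ratio like 1.58x concrete, it helps to translate it into probabilities at a hypothetical baseline. The baseline arrest probability below (5%) is an assumption chosen purely for illustration, not a figure from the data.

```python
# Hypothetical baseline probability of arrest following a stop.
p_base = 0.05
odds_base = p_base / (1 - p_base)

# Applying the reported odds ratio for Black subjects (1.58x),
# then converting the scaled odds back into a probability.
odds_new = odds_base * 1.58
p_new = odds_new / (1 + odds_new)
print(round(p_new, 4))
```

So a 1.58x odds ratio lifts a 5% baseline arrest probability to roughly 7.7% - a meaningful increase, even though odds ratios do not translate one-to-one into probability multipliers.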
The other tract variables either proved to not be significant, or if they were significant, to usually have an OR very close to 1 and therefore not affect the likelihood of arrest either way.
The smaller model was almost identical to the larger model above, but now included the perception of race variable provided by the RIPA-compliant data. It was a categorical variable, and in the scenario it was assumed that the race of the subject had been perceived prior to stop. The results of this are printed below, and also available for viewing here.
Here, the only race group that did not have a significant relationship with arrest is Hispanic. White had a p-value just under .05, meaning it was significant but only barely included. White's OR was .81x, suggesting that being White made subjects noticeably less likely to be arrested. The Black and Report Risk Groups variables were again very significant, echoing the previous larger models. In these models, however, the inclusion of the race-perception variable also resulted in the race-perception and tract BIPOC composition variables both being very significant in every scenario. The resulting odds for race perceived prior were consistently approximately 1.4x, and for tract BIPOC composition were between 2.6x-2.8x. This was almost twice the odds ratio of both the race and race-perception variables, suggesting not only that tract BIPOC composition had a significant relationship with arrest, but also that knowing whether the officer had perceived the subject's race beforehand was a very important part of fully understanding how other variables contributed to disparities in stops.
The Asian group was also analyzed because it made up a large portion of the subjects, even though it wasn't the focus of the previous report. The results of the large and small models are available here. In both models, being Asian had a very significant relationship to arrest, with reported odds of .22x in the larger model and .41x in the smaller model. These are extremely low, and reveal that even when race is perceived, Asian subjects are much less likely to be arrested following a stop.
Similar to the previous larger models, the majority of tract variables either weren't significant or did not have meaningful odds ratios. Interestingly, the tract distance from the university variable actually stayed significant in most models, with a high of p ≈ .06 in the White model. Its OR stayed between 1.2x-1.3x, indicating that when race had been perceived, distance from the university did matter.
Our multiple logistic regression analysis showed that the broad racial disparities revealed in previous reports do still exist, and that the new data on police perception of race was in fact necessary to further confirm this. Black and other Risk Group subjects are more likely to be arrested based on race alone, and especially so if their race was perceived prior to the stop.
The smaller models, although recent and containing fewer observations, echoed the findings of the larger models and additionally showed that if race had been perceived prior to a stop:
Overall, this reveals that police in the city of Berkeley are not only arresting marginalized groups at disproportionate rates, but also more specifically that race is a factor when it comes to determining if someone should be arrested.
Hopefully future policy changes in the city of Berkeley will address some of these disparities. In late 2021, Berkeley City Council approved a package of policy changes proposed by a city working group launched by the mayor in 2020 and composed of city officials, police, and community stakeholder organizations. Key changes include that police would no longer be able to conduct traffic stops for low-level traffic violations, and that written consent would have to be obtained for a search to be conducted, suggesting that the city is trying not only to reduce the volume of discriminatory stops/searches, but also to make them safer when they do happen. More recent updates from the city include hints toward centralizing all transportation-related work into one department and possibly including citizen volunteers (BerkDOT), spending approximately $200k on municipal code and staffing analyses, and creating a new "specialized care unit" (SCU) to handle behavioral health crisis response. Implementation will likely take at least 2-3 years, although a pilot program for the SCU may be launched by late 2022. An updated timeline of these events is available here.
It's our hope that the analysis done here both proved the points of the CPE report before there were substantial changes in the data, and also set up a baseline of likelihoods within RIPA-compliant data for observing how policy changes in Berkeley impact stop subjects in the future. It will be interesting to continue analysis of the data as Berkeley moves into a drastically different municipal space, and to see what the data looks like as we move out of COVID-19.